MASSACHUSETTS INSTITUTE OF TECHNOLOGY 
Cambridge, Massachusetts 
Project MAC 


Artificial Intelligence Project Memorandum MAC-M-165 

Memo 70 June 25, 1964 


Hash-Coding Functions of a Complex Variable 
by William A. Martin 


ABSTRACT 


A common operation in non-numerical analysis is the comparison 
of symbolic mathematical expressions. Often equivalence under 
the algebraic and trigonometric relations can be determined with 
high probability by hash-coding the expressions using finite field 
arithmetic and then comparing the resulting hash-code numbers. 

The use of this scheme in a program for algebraic simplification is 
discussed. 

I. Introduction 

The elementary functions of a complex variable are those which can 
be expressed by the following recursive scheme. Any complex constant or 
variable will be called an expression; if u and v are expressions, then 
so are $u+v$, $u \cdot v$, $u^v$, $e^u$, $-u$, and $1/u$. The trigonometric and hyperbolic 
functions may be expressed explicitly. Because of the defining relations 
of the complex field and the trigonometric identities, there are infinitely 
many expressions for any given function. Two expressions will be said to 
be equivalent if they represent the same function. 

Existing schemes for expression comparison use the defining relations 
along with some additional conditions to put each expression in a canonical 
form. If the canonical forms of two expressions are identical, they must 
represent the same function. This method has certain drawbacks. First, 
putting the expression in a canonical form requires the comparison of many 
subparts of the expression with each other. In particular, the commutative 
law requires that the terms in sums and products be sorted. Second, 
discovery that two expressions are equivalent requires a comparison of every 
subpart of one with the corresponding subpart of the other. Third, it is 
very difficult to reduce all equivalent expressions to one compact 
canonical form. None of the existing schemes does this. 


This memo explores a probabilistic approach. Suppose F(z) ≠ G(z) 
(F and G are elementary functions); then F(z) - G(z) = 0 has, at most, a 
countable number of solutions, while the complex numbers are uncountable. 
Therefore, the probability that F(z) - G(z) = 0 for a point z chosen at 
random is 0. Thus, it would be possible to test for equivalence of 
expressions by comparing their values at a randomly selected point. It is 
possible to approximate this test with the finite arithmetic of a computer. 
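As an illustration of the random-point test, the following Python sketch compares expressions at one randomly chosen complex point using ordinary floating point. The function names, the tolerance, and the seed are assumptions made for this example; none of them come from the memo.

```python
import random


def f(z):
    # (z + 1)^2 written as a product
    return (z + 1) * (z + 1)


def g(z):
    # the same function in expanded form
    return z * z + 2 * z + 1


def h(z):
    # a different function: differs from f by the constant 1
    return z * z + 2 * z


def probably_equal(u, v, tol=1e-9, seed=0):
    """Compare u and v at one random point of the unit square."""
    rng = random.Random(seed)
    z = complex(rng.random(), rng.random())
    return abs(u(z) - v(z)) <= tol * max(1.0, abs(u(z)))
```

Here `probably_equal(f, g)` holds while `probably_equal(f, h)` does not. The tolerance is needed only because floating point is inexact, which is the weakness examined next.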

One method would be to substitute a random floating point number for 
each occurrence of each distinct variable and then evaluate the resulting 
expression using floating point arithmetic. This method is limited by 
overflow and roundoff error. For example, if x is a floating point number 
chosen at random from a flat distribution, then with probability one half 
$x^2$ is larger or smaller than all the floating point numbers; it does not 
appear possible to find a rule for mapping $x^2$ back into the floating point 
numbers such that the code numbers of equivalent expressions will be very 
nearly the same. This overflow is difficult to avoid by restricting the 
initial choice of floating point numbers since expressions of the form $u^v$ 
are allowed. Furthermore, if two expressions, x and y, are of different 
orders of magnitude, then, because of roundoff error, x + y may evaluate 
to either x or y. This is a particular disadvantage since it is likely that 
an expression will be compared with subparts of itself. The same problems 
arise with a floating point approximation to the complex numbers. 
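Both failure modes are easy to reproduce. The sketch below uses Python's double-precision floats, which are an assumption of this example and not the 7094 format; the particular magnitudes are likewise chosen only for illustration.

```python
import math

x = 1.0e200
big, small = 1.0e20, 1.0

overflowed = x * x                 # too large for a double: becomes inf
absorbed = (big + small) == big    # small is lost to roundoff
```

After these lines `overflowed` is infinite and `absorbed` is true, so the code number of `big + small` could not be told apart from that of `big` alone.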

One possible answer, which we investigate here, is to use a finite 
field, instead of the infinite field of real numbers. 

II. Finite Fields and the Exponent Arithmetic 
a. Finite Fields 

The use of floating point numbers in the code number scheme is limited 
because the sum or product of two floating point numbers is not necessarily 
a floating point number. This problem is avoided if a finite field, F, is 
used, since the field can be chosen small enough so that every element can 
be represented by a computer number. The task is to choose F such that 
expressions which are equivalent in the complex numbers are also equivalent 
in it. That is, we need a homomorphism from the complex numbers onto F. 
We now develop a field which meets this requirement in many, but not all, 
cases. 

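An abelian group G is a set of elements with an operation $\times$ (written 
below as juxtaposition) and an identity element e such that: 

1. if $a \in G$ and $b \in G$, then $a \times b \in G$ 

2. if $a \in G$, then $ae = ea = a$ 

3. if $a \in G$, then there is an $a^{-1} \in G$ such that $aa^{-1} = a^{-1}a = e$ 

4. if $a \in G$ and $b \in G$, then $ab = ba$ 

The operation is also taken to be associative: $(ab)c = a(bc)$. 
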

A finite field F is a finite set of elements with an operation + under 
which the elements of F form a group with identity 0 (the additive group), 
and an operation $\cdot$ under which the elements of $F' = F - \{0\}$ form a group with 
identity 1 (the multiplicative group). In addition the relations 
$a \cdot (b + c) = a \cdot b + a \cdot c$ and $a \cdot 0 = 0$ hold. 

If m and n are integers and p is a prime integer then $c = (m + n) \bmod p$ 
means that c equals the remainder of (m + n)/p. Multiplication mod p is 
defined similarly. It can be verified that the non-negative integers less than a 
prime p form a finite field under the operations addition and multiplication mod p. 
The additive inverse of 1, written -1, is seen to be p - 1 since (p - 1) + 1 = 0 mod p. 
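A minimal sketch of this arithmetic follows. The prime 13 and the helper names `add`, `mul`, and `neg` are assumptions chosen for illustration.

```python
P = 13  # a small prime, chosen only for illustration


def add(m, n, p=P):
    return (m + n) % p


def mul(m, n, p=P):
    return (m * n) % p


def neg(m, p=P):
    # additive inverse: the element that sums with m to 0
    return (-m) % p
```

With these definitions `neg(1)` is 12, that is p - 1, and `add(12, 1)` is 0.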

b. The element i 

In the complex field there is an element i such that $i \cdot i = -1$, so 
such an element is also required in F. To see how this restricts the 
choice of p one needs the fact, found in the references, that the 
multiplicative group F' of a finite field is cyclic. This means that there is an 
element a (called a generator) in F' such that every element in F' is 
some power of a. In fact, F' can be written $1, a, a^2, \ldots, a^{p-2}$ and 
$a^{p-1} = 1$. Since p is an odd prime, $(p-1)/2$ is an integer, and 
$a^{(p-1)/2} = -1$ since $a^{(p-1)/2} \ne 1$ and $(a^{(p-1)/2})^2 = 1$. If $(p-1)/2$ is even 
then $r = (p-1)/4$ is an integer such that if $i = a^r$, then $i^2 = -1$. Note that 
either $a^r$ or $a^{3r}$ can be chosen as i and the other becomes -i. We have 
thus shown that F will have an element i if and only if p is of the form 
4q + 1. 
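The construction can be tried on a small prime. In this sketch the brute-force generator test and the names `is_generator` and `find_i` are assumptions for illustration; a practical search would use the divisor test described in section III.

```python
def is_generator(a, p):
    # a generates F' iff its powers hit every nonzero residue
    return len({pow(a, k, p) for k in range(1, p)}) == p - 1


def find_i(p):
    """Return (generator, i, -i) for a prime p of the form 4q + 1."""
    if p % 4 != 1:
        raise ValueError("p must be of the form 4q + 1")
    a = next(g for g in range(2, p) if is_generator(g, p))
    r = (p - 1) // 4
    return a, pow(a, r, p), pow(a, 3 * r, p)
```

For p = 13 this yields the generator 2 with i = 8 and -i = 5, and 8 * 8 = 64 = 12 = -1 mod 13. For p = 7, which is of the form 4q + 3, no residue squares to -1.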

c. The Exponent Arithmetic 

In the complex numbers, one might have to test for the equivalence of 
two expressions such as $u^{v+1}$ and $u^v \cdot u$, where the exponent arithmetic is also 
performed in the complex numbers. However, since the multiplicative group F' 
is a cyclic group with one less element than F, v + 1 must be computed 
mod (p - 1). Since an isomorphism does not exist between the additive 
and multiplicative groups of F, the exponent operations cannot be performed 
in it. This failure of the finite field evaluation to be recursive in the 
exponent direction is a serious limitation. Furthermore, since p - 1 is not 
a prime the exponent operations will not form a field. Fortunately, many 
expressions encountered in analysis have rather simple exponents and so 
much can be saved by evaluating the exponents in the E arithmetic which 
we now define. 
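The need for a separate modulus in the exponent can be seen in a few lines. The prime 13 and the helper name `power` are assumptions for this sketch.

```python
P = 13  # small prime for illustration


def power(u, v, p=P):
    """u^v in F for nonzero u, with the exponent reduced mod p - 1."""
    return pow(u, v % (p - 1), p)


# Reducing the exponent mod p instead gives a wrong answer:
correct = pow(2, 13, P)          # 2^13 mod 13
wrong = pow(2, 13 % P, P)        # exponent reduced mod p, i.e. 2^0
```

Here `correct` is 2 while `wrong` is 1, whereas `power(u, v + 1)` always agrees with `power(u, v) * u % P` for nonzero u.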

Let the basic elements of E be the non-negative integers less than p - 1. Addition 
and multiplication are mod (p - 1). It is easy to see that all elements 
have an additive inverse. No even integers have a multiplicative inverse, 
however, for with p - 1 = 4q this would imply, for some integer m: 

$2s \cdot (2s)^{-1} = 4qm + 1$ 
$2 \cdot (s \cdot (2s)^{-1} - 2qm) = 1$ 

which is a contradiction since 1 is not divisible by 2. If 
we take q prime, then the odd integers not divisible by q have a multiplicative 
inverse as a consequence of the Euler theorem (see Ref., Albert, p.47): 

Let $\phi(m)$ be the number of integers g such that $0 < g \le m$ and 
g is prime to m. Then 

$a^{\phi(m)} \equiv 1 \pmod{m}$ 

for every a prime to m. 
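Euler's theorem gives the inverse directly as the power $\phi(m) - 1$ of the element. The sketch below is an illustration under that reading; the names `phi` and `inverse_in_E` and the brute-force count of $\phi$ are assumptions suitable only for small moduli.

```python
from math import gcd


def phi(m):
    # Euler's function by direct count (small m only)
    return sum(1 for g in range(1, m + 1) if gcd(g, m) == 1)


def inverse_in_E(a, p):
    """Inverse of a mod p - 1, or None when a is not prime to p - 1."""
    m = p - 1
    if gcd(a, m) != 1:
        return None
    return pow(a, phi(m) - 1, m)
```

For p = 13 the modulus is 12 = 4 * 3: the element 5 is its own inverse, while the even element 4 and the multiple 3 of q = 3 have none.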

The failure of the even integers to have a multiplicative inverse 
means that $u^{1/2 + 1/2}$ will not evaluate to u. We therefore adjoin to the 
basic elements of E the element $\rho$, where $2\rho = 1$. Closing E under 
multiplication and addition would require that all the elements of the form $b\rho^n$ 
(b an integer less than p - 1 and b odd) be in E. However, many cases 
can be covered if we allow only elements a or $b\rho$. 

d. Square Roots in F 

$u^{\rho}$ should evaluate to the square root of u in F; however, only 
one half of the elements in F' have a square root: these are the even powers 
of the generator, a, of F'. Since $(a^n)^r$ is an even power of a for even n and 
any r, only 1/4 of the elements could have a square root computable by raising 
the element to some power, that is, by assigning some integer to $\rho$. However, a 
method of finding the roots of this smaller set can be found, as will be shown 
next; i.e., there exists a $\rho$ such that if $u = a^{4n}$, then $u^{\rho} = a^{2n}$. 

If p is of the form 4q + 3, then since $4n(q + 1) \equiv 2n \pmod{4q + 2}$ 
one finds $(a^{4n})^{q+1} = a^{2n}$. q + 1 is therefore the proper value for $\rho$. The 
requirement that p be of the form 4q + 3 is unfortunately in conflict with 
the earlier requirement that p be of the form 4q + 1. By choosing 
$p = 8q' + 5 = 4(2q' + 1) + 1$ one obtains by a similar argument the square 
root of 1/8 of the elements with $\rho = q' + 1$. 
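The claim for p = 8q' + 5 can be checked exhaustively on small primes. In this sketch the names `sqrt_exponent` and `eighth_powers` are assumptions; the exponent q' + 1 is the integer standing in for 1/2.

```python
def sqrt_exponent(p):
    """Integer q' + 1 used in place of 1/2 when p = 8q' + 5."""
    if p % 8 != 5:
        raise ValueError("p must be of the form 8q' + 5")
    return (p - 5) // 8 + 1


def eighth_powers(p):
    # the elements a^(8n): the set on which the root is computable
    return {pow(x, 8, p) for x in range(1, p)}
```

For p = 29 the exponent is 4, and every u in `eighth_powers(29)` satisfies `pow(u, 4, 29) ** 2 % 29 == u`. For such small primes the eighth powers coincide with the fourth powers, so the set is larger than 1/8 of F'.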

e. Trigonometric Identities 

Define in the usual manner: 


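$\sin\theta = \frac{e^{i\theta} - e^{-i\theta}}{2i}$,   $\cos\theta = \frac{e^{i\theta} + e^{-i\theta}}{2}$ 

$\sinh\theta = \frac{e^{\theta} - e^{-\theta}}{2}$,   $\cosh\theta = \frac{e^{\theta} + e^{-\theta}}{2}$ 

where e is an element of F yet to be chosen. Note that $i^2 \ne -1$ in the 
E arithmetic but this does not arise in taking sums and products of the 
above functions. For instance: 

$\sin^2\theta + \cos^2\theta = \left(\frac{e^{i\theta} - e^{-i\theta}}{2i}\right)^2 + \left(\frac{e^{i\theta} + e^{-i\theta}}{2}\right)^2$ 

$= \frac{e^{i\theta}e^{i\theta} - 2e^{i\theta}e^{-i\theta} + e^{-i\theta}e^{-i\theta} - e^{i\theta}e^{i\theta} - 2e^{i\theta}e^{-i\theta} - e^{-i\theta}e^{-i\theta}}{-4}$ 

$= \frac{-2 \cdot 1 - 2 \cdot 1}{-4}$ 

$= 1$ 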
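The identity can be confirmed numerically in a small field. The sketch assumes p = 13 with generator 2, takes e as the square of the generator, and uses the integer value of i in the exponent; all of these choices, and the names `inv` and `sin_cos`, are illustrative. The identity itself does not depend on them.

```python
P = 13                           # small prime of the form 8q' + 5
A = 2                            # a generator of F' for p = 13
I_F = pow(A, (P - 1) // 4, P)    # i in F: a square root of -1
E_BASE = pow(A, 2, P)            # assumed choice of e: an even power of A


def inv(x, p=P):
    # multiplicative inverse via x^(p-2)
    return pow(x, p - 2, p)


def sin_cos(theta):
    """(sin theta, cos theta) in F for an integer theta."""
    t = pow(E_BASE, (I_F * theta) % (P - 1), P)   # e^(i*theta)
    s = (t - inv(t)) * inv(2 * I_F) % P
    c = (t + inv(t)) * inv(2) % P
    return s, c
```

For every integer theta the returned pair satisfies `(s * s + c * c) % P == 1`.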

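It is necessary that $e^{i\pi} = -1$ and $e^{i\pi/2} = +i$. If e is to have a square root 
it must be an even power of a; taking $e = a^{2n}$ one obtains for p = 8q' + 5 

$e^{i\pi} = -1$ 

$a^{2ni\pi} = a^{4q' + 2}$ 

$2ni\pi \equiv 4q' + 2 \pmod{8q' + 4}$ 

$ni\pi \equiv 2q' + 1 \pmod{8q' + 4}$    (1) 

Equation (1) is the condition $e^{i\pi/2} = i$ written in exponents, with 
$i = a^{2q'+1}$; it implies the preceding congruence. 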

For any choice of n, e is determined, providing equation (1) can be 
solved for the element $\pi$. Some reflection will show that trigonometric 
calculations may involve roots of e of order greater than 2. Suppose n is chosen 
odd; then one square root of e, $a^n$, has no square root, nor does $-a^n$, the 
other square root of e. That $-a^n$ does not have a square root is a 
consequence of the choice of -1 as a square: $-a^n = (-1) \cdot a^n$ = (square)(nonsquare), 
and a square times a nonsquare must be a nonsquare. From this one can see 
that if e is to have a $2^m$th root, it must be chosen of the form $a^{2^m k}$ 
for some integer k. Note that the choice of n divides the elements of F' 
between the roots and powers of e. 

Assuming that square roots of e will be of chief interest, we return 
to the solution of equation (1) for the case n = 1. A sufficient 
condition for solution is that i be of the form 4r + 1, whence equation (1) 
yields $\pi = 2q' + 1$. Thus $\pi$ is equal to q and should be taken prime. 
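The sufficiency of this condition can be checked by direct substitution into equation (1) with n = 1, taking i = 4r + 1 and the solution 2q' + 1 for the unknown element (written $\pi$ below):

```latex
i\pi = (4r + 1)(2q' + 1)
     = r\,(8q' + 4) + (2q' + 1)
     \equiv 2q' + 1 \pmod{8q' + 4}
```

The multiple of 8q' + 4 drops out, so the congruence holds for every integer r.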

It remains to verify that for the choices made the pairs ($\sin\theta$, 
$\cos\theta$) can be made to take on $4\pi = 8q' + 4$ distinct pairs of values in 
F as $\theta$ runs over the $6\pi$ elements of E. Each pair occurs at most twice. 
Since i is odd, it is relatively prime to $4\pi$ unless $i = \pi$. As $\pi$ is of 
the form 2q' + 1 and i of the form 4m + 1, $i \ne \pi$ if q' is odd. With i 
relatively prime to $4\pi$ the relation $y = ix \bmod 4\pi$ has a unique solution 
$x = i^{-1}y$ for each y less than $4\pi$. The $6\pi$ elements of E can be represented 
in the form $x\rho$ where x takes on the odd values from 0 to $4\pi - 1$ and all the 
values from $4\pi$ to $8\pi - 1$. Then $e^{i\theta} = a^{ix} = a^{y}$ takes on all the values 
in F' and no value more than twice. It is easy to show that if 

$a_1 = \frac{x_1 + 1/x_1}{2}$,   $b_1 = \frac{x_1 - 1/x_1}{2i}$ 

and 

$a_1 = \frac{x_2 + 1/x_2}{2}$,   $b_1 = \frac{x_2 - 1/x_2}{2i}$ 

then $x_1 = x_2$, which completes the proof. 
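The last step, that distinct field elements give distinct (sin, cos) pairs, can be confirmed by enumeration on a small field. The function name `trig_pairs` and the test primes are assumptions for this sketch.

```python
def trig_pairs(p, a):
    """Set of (sin, cos) pairs as e^(i*theta) runs over the powers of a."""
    i_f = pow(a, (p - 1) // 4, p)           # i in F

    def inv(x):
        return pow(x, p - 2, p)

    pairs = set()
    for y in range(p - 1):
        t = pow(a, y, p)                     # a^y, one value of e^(i*theta)
        s = (t - inv(t)) * inv(2 * i_f) % p
        c = (t + inv(t)) * inv(2) % p
        pairs.add((s, c))
    return pairs
```

With the generator 2, `len(trig_pairs(13, 2))` is 12 and `len(trig_pairs(29, 2))` is 28, that is p - 1 distinct pairs in each case, since cos + i sin recovers the element t.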
f. Summary of the Requirements 

1. p of the form 8q' + 5. 

2. q' odd as a sufficient condition. 

3. i of the form 4m + 1 as a sufficient condition. 

4. 2q' + 1 prime 

III. Machine Realization--Finding a Prime 

The requirements in section II can be rephrased as: 

1. p = 16n + 13 prime. 

2. $\pi = 4n + 3$ prime. 

3. i of the form 4m + 1. 

Another requirement is: 

4. p less than 1/2 the largest machine integer. 

This allows addition without overflow. The multiplicative inverse of an 
element a in F is found by noting that $a^{-1} = a^{p-2}$. To raise an element a 
to any power we begin by squaring it repeatedly, creating the numbers 
$b_k = a^{2^k}$; we then express the power as a binary number and multiply 
together the appropriate $b_k$. This leads to the requirement: 

5. p - 2 should be expressible as the sum of a few powers of 2. 
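A minimal sketch of this scheme follows, read as combining the selected repeated squares by multiplication. The names `power_mod` and `inverse` are assumptions; Python's built-in three-argument `pow` does the same job and serves here as a cross-check.

```python
def power_mod(a, k, p):
    """a^k mod p by repeated squaring over the binary digits of k."""
    result = 1
    b = a % p                  # b holds a^(2^j) at step j
    while k > 0:
        if k & 1:              # this binary digit of k is 1
            result = result * b % p
        b = b * b % p          # square for the next digit
        k >>= 1
    return result


def inverse(a, p):
    # multiplicative inverse in F as a^(p-2)
    return power_mod(a, p - 2, p)
```

Each 1 digit in the binary form of the exponent costs one extra multiplication, which is why an exponent p - 2 with few such digits is preferred.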

To find a prime for the 7094 the following procedure was followed: 

1. Beginning with $n = 2^{29}$, test if p = 16n + 13 is prime by dividing by 
every odd number up to $\sqrt{p}$. 

2. Test if 4n + 3 is prime. 

3. Find a generator of F'. (If a is not a generator then $a^2 = 1$ or 
$a^4 = 1$ or $a^{\pi} = 1$ or $a^{2\pi} = 1$. Raise a to these four powers by the scheme 
above.) Almost 1/2 of the elements are generators so one is quickly found. 

4. Compute $i_1 = a^{4n+3}$. 

5. If $i_1$ is not odd, compute a second generator $a_2$ and $i_2 = a_2^{4n+3} = -i_1$. $i_2$ will be odd. 

6. Check to see if this odd i is of the form 4m + 1. 
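The six steps can be sketched as follows for small starting values. This is an illustration under stated assumptions and not the original 7094 routine: the function names are invented here, generators are tried in increasing order starting at 2, step 5 takes the other square root of -1 directly as p - i, and a candidate n is abandoned when step 6 fails.

```python
def is_prime(m):
    """Trial division by 2 and by every odd number up to sqrt(m)."""
    if m < 2:
        return False
    if m % 2 == 0:
        return m == 2
    d = 3
    while d * d <= m:
        if m % d == 0:
            return False
        d += 2
    return True


def is_generator(a, p, pi):
    # step 3: a fails iff one of these four powers equals 1
    return all(pow(a, k, p) != 1 for k in (2, 4, pi, 2 * pi))


def find_parameters(n_start):
    """Return (p, pi, a, i) for the first suitable n >= n_start."""
    n = n_start
    while True:
        p, pi = 16 * n + 13, 4 * n + 3
        if is_prime(p) and is_prime(pi):                  # steps 1 and 2
            a = next(g for g in range(2, p) if is_generator(g, p, pi))
            i = pow(a, pi, p)                             # step 4
            if i % 2 == 0:
                i = p - i                                 # step 5: -i is odd
            if i % 4 == 1:                                # step 6
                return p, pi, a, i
        n += 1
```

Starting from n = 0 the sketch returns p = 13, 4n + 3 = 3, generator 2 and i = 5; starting from n = 1 it returns p = 29, 4n + 3 = 7, generator 2 and i = 17.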

After two hours of computation this procedure resulted in: 
p = 8,589,949,373 
a = 13,560,097 
e = 8,364,320,344 
$\pi$ = 2,147,487,343 
i = 5,525,736,173 
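The published figures can be checked against the requirements of sections II and III. The sketch below is one way to do so; the variable `pi` stands for the prime listed fourth above, and the dictionary labels and the helper `is_prime` are assumptions of this example.

```python
def is_prime(m):
    """Trial division by 2 and by every odd number up to sqrt(m)."""
    if m < 2:
        return False
    if m % 2 == 0:
        return m == 2
    d = 3
    while d * d <= m:
        if m % d == 0:
            return False
        d += 2
    return True


p = 8_589_949_373
a = 13_560_097
e = 8_364_320_344
pi = 2_147_487_343
i = 5_525_736_173

checks = {
    "p = 16n + 13": p % 16 == 13,
    "p - 1 = 4 * pi": p - 1 == 4 * pi,
    "e = a^2 in F": a * a % p == e,
    "i^2 = -1 in F": i * i % p == p - 1,
    "i = 4m + 1": i % 4 == 1,
    "equation (1) with n = 1": i * pi % (p - 1) == pi,
    "p prime": is_prime(p),      # reported by the memo; recomputed here
    "pi prime": is_prime(pi),    # reported by the memo; recomputed here
}
```

The first six entries are plain arithmetic on the listed numbers and evaluate to true. In particular e is the square of a, which corresponds to the case n = 1 of section II.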

Machine language routines were written to perform the operations in 
the E and F arithmetic. The evaluation of a code number is performed in 
the LISP language. The following conventions were followed: 

1. Multiplication by $\rho$ is indicated with the computer word minus sign. 

2. Recursion in the exponent direction follows the pattern F, E, F, E, etc. 

3. Floating point numbers in an expression are treated as rational 
numbers and the corresponding F or E element is computed. 

4. Whenever the element $\rho^2$ appears, it is changed to $b\rho$ by multiplying by 
the integer assigned to $\rho$. 
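The alternation between the two arithmetics can be sketched in a few lines. This is a simplified illustration in Python, not the memo's LISP program: expressions are nested tuples, only the operators +, * and ^ are handled, and the adjoined half-integer element, floating point constants, and the constant e are all omitted. The prime 29, the name `code_number`, and the sample variable values are assumptions.

```python
P = 29  # small prime of the form 16n + 13; illustration only


def code_number(expr, values, in_F=True):
    """Evaluate expr in F (mod P) or in E (mod P - 1)."""
    m = P if in_F else P - 1
    if isinstance(expr, int):
        return expr % m
    if isinstance(expr, str):
        return values[expr] % m
    op, u, v = expr
    if op == "^":
        base = code_number(u, values, in_F)
        exponent = code_number(v, values, not in_F)  # switch arithmetic
        return pow(base, exponent, m)
    left = code_number(u, values, in_F)
    right = code_number(v, values, in_F)
    if op == "+":
        return (left + right) % m
    if op == "*":
        return (left * right) % m
    raise ValueError("unknown operator: " + op)
```

With x = 5 and y = 11, the expressions `("^", ("+", "x", "y"), 2)` and the expanded sum of `x^2`, `2*x*y` and `y^2` receive the same code number, while `x + y` and `x * y` receive different ones.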


IV. The Probability of Error 

Estimation of the probability of error is difficult. The average 
probability of error for certain subsets of expressions will differ from 
that for all expressions. No statistics are available on the expressions 
which will be encountered in practice. 

It is possible that two expressions which represent the same function 
will receive different code numbers because some exponent operation does 
not preserve the equivalence. Study of section II should make clear under 
what circumstances this will happen. 

In the simplification program described in the next section, 
expressions with the same code number are considered equivalent. Therefore, an 
accidental match of non-equivalent expressions is very serious. If we 
could show that the operations in the E and F arithmetic mapped their sets 
of elements uniformly back onto themselves, then the probability of a 
match between two expressions selected at random from the set of all 
expressions would be 1/p. Unfortunately, this is not the case. In the F 
arithmetic the operations of multiplication and addition and their inverses do 
satisfy this criterion. Looking at the cyclic group F', however, one sees 
that raising any element of F', except a power of the generator whose 
exponent is a multiple of $\pi$, to all powers 
will produce either 1/4, 1/2, or all the elements of F'. Thus, 
exponentiation tends to map the elements into the 4th powers of a generator and so 
increase the probability of random match. 
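The bunching can be observed directly in a small field. The sketch counts, for each element of F', how many distinct values its powers take; the name `orbit_sizes` and the prime 29 (where p - 1 = 4 * 7) are assumptions for illustration.

```python
from collections import Counter


def orbit_sizes(p):
    """Map 'number of distinct powers' to 'how many elements have it'."""
    sizes = Counter()
    for u in range(1, p):
        sizes[len({pow(u, k, p) for k in range(p - 1)})] += 1
    return sizes
```

For p = 29, twelve elements reach all 28 values, six reach 14 and six reach 7, that is all, 1/2, and 1/4 of F'. The remaining four elements, whose exponents are multiples of 7, reach only 4, 2, or 1 values.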

The same bunching occurs in the E arithmetic. The distribution of 
elements after n operations can only be found using a rather complicated 
two dimensional convolution. The distributions after one operation 
shown in Figure 1 indicate that for moderately complicated exponents the 
probability of error should remain under control. 

Figure 1. Distribution of elements in E after one operation. Panels: 
initial, addition, multiplication, division. Vertical axis: relative 
frequency. Horizontal axis classes: odd integers; even integers not 
divisible by 4; even integers divisible by 4; $b\rho$ elements. Within each 
classification, the elements are ordered in the normal manner. 

V. The Simplification Program 

A simplification program has been written in LISP. The program 
collects terms in sums and products, removes unnecessary levels of 
parentheses, and recognizes identities involving 0 and 1. The explicit 
operations of division and subtraction have been removed from the LISP 
representation; instead, addition of a negative quantity and 
exponentiation to a negative power are used. An attempt was made to exploit the 
similarity between the operations on the additive and the multiplicative 
groups. Some further improvement could be made. The program alters 
list structure so that common subexpressions are simplified only once. 

To recognize equivalent subexpressions the program uses the hash 
code just described; the hash code numbers are added to the front of the 
expressions. It would have been possible for all subexpressions to have 
hash code numbers on their property lists; however, because of limited 
storage the decision was made to retain only the numbers at the current 
level. This means that in certain situations it is necessary to 
recompute the numbers. The program appears to be faster and more powerful 
than those using canonical ordering. 
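How hash codes let a simplifier collect terms without sorting can be sketched as follows. This is a hypothetical Python illustration, not the LISP program described above: the tuple representation, the fixed variable values, the prime 29, and the names `code_number` and `collect_terms` are all assumptions.

```python
P = 29
VALUES = {"x": 5, "y": 11}   # assumed random assignment to the variables


def code_number(expr):
    """Hash code of a sum/product expression in arithmetic mod P."""
    if isinstance(expr, int):
        return expr % P
    if isinstance(expr, str):
        return VALUES[expr] % P
    op, u, v = expr
    a, b = code_number(u), code_number(v)
    return (a + b) % P if op == "+" else (a * b) % P


def collect_terms(terms):
    """Group the terms of a sum by code number and count repeats."""
    groups = {}
    for term in terms:
        key = code_number(term)
        if key in groups:
            groups[key][1] += 1        # same code: treat as the same term
        else:
            groups[key] = [term, 1]
    return [(count, term) for term, count in groups.values()]
```

Given the terms `x*y`, `x`, and `y*x`, the sketch returns the coefficient 2 for `x*y` and 1 for `x`: the two products share a code number even though their operands appear in different order, so no canonical ordering is required.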